Reporting Exact and Approximate Regular Expression Matches

Authors

  • Eugene W. Myers
  • Paulo Oliva
  • Katia S. Guimarães
Abstract

While much work has been done on determining if a document or a line of a document contains an exact or approximate match to a regular expression, less effort has been expended in formulating and determining what to report as "the match" once such a "hit" is detected. For exact regular expression pattern matching, we give algorithms for finding a longest match, all symbols involved in some match, and finding optimal submatches to tagged parts of a pattern. For approximate regular expression matching, we develop notions of what constitutes a significant match, give algorithms for them, and also for finding a longest match and all symbols in a match.
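To make the distinction between detecting a hit and choosing what to report concrete, here is a minimal Python sketch of the "longest match" notion for exact matching. It is not the algorithm from the paper: it is a brute-force illustration built on Python's standard `re` module, and the function name `longest_match` is a hypothetical helper introduced only for this example.

```python
import re


def longest_match(pattern: str, line: str):
    """Return (start, end) of a longest substring of `line` that the whole
    `pattern` matches, or None if no substring matches.

    Brute force over all substrings (quadratically many regex calls);
    meant only to illustrate the notion, not to be efficient.
    Ties are broken in favour of the leftmost longest substring.
    """
    compiled = re.compile(pattern)
    best = None
    for start in range(len(line) + 1):
        # Try the longest candidate ending first for this start position.
        for end in range(len(line), start - 1, -1):
            if compiled.fullmatch(line, start, end):
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                break  # shorter ends cannot beat this one for this start
    return best


# A backtracking engine reports the first alternative that succeeds,
# which need not be a longest match.
first_hit = re.search("a|ab", "ab").span()
longest = longest_match("a|ab", "ab")
```

In this sketch `first_hit` covers only the leading `a`, while `longest` covers the whole string, which shows why "the match" needs a definition beyond the yes/no answer.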


Similar resources

Approximate Regular Expression Searching with Arbitrary Integer Weights

We present a bit-parallel technique to search a text of length n for a regular expression of m symbols permitting k differences in worst case time O(mn / log_k s), where s is the amount of main memory that can be allocated. The algorithm permits arbitrary integer weights and matches the complexity of the best previous techniques, but it is simpler and faster in practice. In our way, we define a n...


ProMiner: Organism-specific protein name detection using approximate string matching

were required, (2) disambiguation failed because of missing synonyms, e.g. "vertebrate", and (3) for several cases the provided gold standard might be incorrect, as the considered abstracts describe findings in rat or human instead of mouse.

Description: Examples
Unspecific synonym: growth retarded, perinatal lethality, long lived
Wrong context: TGF-beta superfamily, c-myc tumors
Unknown ambiguity: high ...


Derivatives of Approximate Regular Expressions

Our aim is to construct a finite automaton recognizing the set of words that are at a bounded distance from some word of a given regular language. We define new regular operators, the similarity operators, based on a generalization of the notion of distance and we introduce the family of regular expressions extended to similarity operators, that we call AREs (Approximate Regular Expressions). W...


Computing Semantic Similarity between Skill Statements for Approximate Matching

This paper explores the problem of computing text similarity between verb phrases describing skilled human behavior for the purpose of finding approximate matches. Four parsers are evaluated on a large corpus of skill statements extracted from an enterprise-wide expertise taxonomy. A similarity measure utilizing common semantic role features extracted from parse trees was found superior to an i...


REAFUM: Representative Approximate Frequent Subgraph Mining

Noisy graph data and pattern variations are two thorny problems faced by mining frequent subgraphs. Traditional exact-matching based methods, however, only generate patterns that have enough perfect matches in the graph database. As a result, a pattern may either remain undetected or be reported as multiple (almost identical) patterns if it manifests slightly different instances in different gr...



Publication date: 1998